
Climate change is one of the most important issues facing the world in this technological era. The clearest evidence is the historical record of temperature change (shown in the left figure⁴). Our project investigates whether the rise in temperatures is linked to industrial activities and the greenhouse effect. Before that investigation, this part aims to highlight the significant aspects of the temperature change data for each area. Before beginning the analysis, I want to briefly outline the problem, its causes, and its effects.
"Climate change is a long-term change in the average weather patterns that have come to define Earth’s local, regional and global climates. Changes observed in Earth’s climate since the early 20th century are primarily driven by human activities, particularly fossil fuel burning, which increases heat-trapping greenhouse gas levels in Earth’s atmosphere, raising Earth’s average surface temperature. These human-produced temperature increases are commonly referred to as global warming.⁵"
According to NASA, water vapour, carbon dioxide (CO₂), methane, nitrous oxide, and chlorofluorocarbons (CFCs) contribute to the greenhouse effect. Over the last century, human activities have strengthened this natural greenhouse effect by increasing the concentrations of greenhouse gases.
"Global climate change has already had observable effects on the environment. Glaciers have shrunk, ice on rivers and lakes is breaking up earlier, plant and animal ranges have shifted, and trees are flowering sooner. Effects that scientists had predicted in the past would result from global climate change: loss of sea ice accelerated sea-level rise and longer, more intense heat waves.⁶"
from IPython.display import HTML
As a first step, I decided what I was curious about regarding climate change in light of the information above, and I wrote these questions down:
I will use Python libraries within the Jupyter notebook environment to investigate these questions. The main libraries I'll be importing are Pandas and NumPy for data wrangling, and Matplotlib and Plotly for data visualization.
# import libraries
# data cleaning
import pandas as pd
import numpy as np
# data visualization
import matplotlib as mpl
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
# Set notebook mode to work offline
pyo.init_notebook_mode()
The Global Surface Temperature Change data distributed by the National Aeronautics and Space Administration Goddard Institute for Space Studies (NASA-GISS) is publicly available. The FAOSTAT Temperature Change domain adds some extra information to this data set, such as area names. Because country names are the key value in this project, I chose to use FAO's data set.
Data in the Temperature Change domain can be reached through the Food and Agriculture Organization of the United Nations' web data portal². According to the FAO Statistical Database Terms of Use, the data set can be used for research, statistical, and scientific purposes. Users may access, download, create copies of, and re-disseminate the datasets subject to these Dataset Terms.
The FAOSTAT Temperature Change domain disseminates statistics of mean surface temperature change by country, with annual updates. The current dissemination covers the period 1961–2019. Statistics are available for monthly, seasonal and annual mean temperature anomalies, i.e., temperature change with respect to a baseline climatology, corresponding to the period 1951–1980. The standard deviation of the temperature change of the baseline methodology is also available.² The data set covers all the countries and territories of the world (in 2019: 190 countries and 37 other territorial entities), with temperature changes expressed in degrees Celsius (°C). The data is disseminated and released on a yearly basis. The file is in comma-separated values (CSV) format, is tabular, and is 6.3 megabytes in size. It includes 9656 rows and 66 columns.
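To make the notion of an anomaly concrete, here is a minimal sketch using made-up numbers (they are not FAOSTAT values). It assumes the anomaly for a year is that year's mean temperature minus the mean over the baseline period; the function name `anomalies` and the sample temperatures are hypothetical.

```python
from statistics import mean

def anomalies(temps_by_year, baseline_start, baseline_end):
    """Return {year: temperature minus the baseline-period mean}."""
    baseline = mean(
        t for year, t in temps_by_year.items()
        if baseline_start <= year <= baseline_end
    )
    return {year: round(t - baseline, 2) for year, t in temps_by_year.items()}

# Made-up annual mean temperatures in °C, for illustration only
temps = {1951: 14.0, 1965: 14.2, 1980: 13.8, 2019: 15.1}
result = anomalies(temps, 1951, 1980)
print(result[2019])
```

A positive value means the year was warmer than the baseline average, and a negative value means it was cooler.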
The first seven columns describe each temperature change record, while the remaining columns hold the temperature change values for each year from 1961 to 2019. Every year has some missing values. Examining the first seven columns in detail:
# Data set information
# read data sets
df = pd.read_csv("Environment_Temperature_change_E_All_Data_NOFLAG.csv", encoding='latin-1') # the CSV file is encoded as latin-1
df_countrycode=pd.read_csv('FAOSTAT_data_11-24-2020.csv') # this CSV file includes the ISO-3 country codes mentioned in Data Wrangling
# examine each column and find the unique items of the Months and Element columns
display(df.head(5))
print('\n')
#display(df.info())
print('\n')
print("Months")
display(df.Months.unique())
print('\n')
print("Elements")
display(df.Element.unique())
print('\n')
display(df_countrycode.head(5))
In this project, the other data sets identify countries by code (the ISO 3166 standard published by the International Organization for Standardization¹). For that reason, I took the ISO-3 codes from another CSV file³ provided by FAOSTAT (the source of the main data set) and added them to my data set as 'Country Code'. Additionally, I encountered a few minor text formatting issues with 'Months' and 'year'. Only a small amount of cleaning and wrangling was required to obtain my visualizations. My cleaning and wrangling activities included:
Renaming columns and some rows:
I renamed the 'Area' column to 'Country Name' because it will be the key for merging with the other project data sets. I renamed the season entries in the 'Months' column as follows: 'Dec\x96Jan\x96Feb' → 'Winter', 'Mar\x96Apr\x96May' → 'Spring', 'Jun\x96Jul\x96Aug' → 'Summer', 'Sep\x96Oct\x96Nov' → 'Fall'. Because each year was written like 'Y1961', I stripped the leading 'Y' and kept the rest as the 'year'.
Deleting columns and some rows:
The 'Area Code', 'Months Code', and 'Element Code' columns are identifiers that do not provide any statistical information. Every entry of the 'Unit' column is 'Celsius degrees °C', so I do not need it as a column. Also, I do not use 'Standard Deviation' in this analysis, so after keeping only the 'Temperature change' rows, I deleted the 'Element' column as well.
Manipulating the data frame:
As mentioned above, to standardize all the data sets in the project, I merged the ISO-3 'Country Code' into my data. To make visualization and analysis easier, I reorganized the data frame by gathering all the year columns into a single 'year' column.
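Merging on country names can silently leave rows without a code when the two files spell a name differently. The following is a minimal sketch, on invented toy frames rather than the FAOSTAT files, of how `indicator=True` in `pd.merge` could be used to list the names that found no match. It uses a left join, which is a choice made for this sketch and not the join used in the notebook.

```python
import pandas as pd

# Toy frames, invented for illustration only
temps = pd.DataFrame({
    "Country Name": ["Austria", "Belgium", "World"],
    "Y2019": [2.0, 1.8, 1.4],
})
codes = pd.DataFrame({
    "Country Name": ["Austria", "Belgium"],
    "Country Code": ["AUT", "BEL"],
})

# indicator=True adds a '_merge' column telling where each row came from
merged = pd.merge(temps, codes, how="left", on="Country Name", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Country Name"].tolist()
print(unmatched)
```

Aggregates such as 'World' are expected to have no ISO-3 code, so a list like this mainly helps to spot genuine spelling mismatches.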
# 1. Renaming
df.rename(columns = {'Area':'Country Name'},inplace = True)
df.set_index('Months', inplace=True)
df.rename({'Dec\x96Jan\x96Feb': 'Winter', 'Mar\x96Apr\x96May': 'Spring', 'Jun\x96Jul\x96Aug':'Summer','Sep\x96Oct\x96Nov':'Fall'}, axis='index',inplace = True)
df.reset_index(inplace = True)
# 2. Filtering: keep only the 'Temperature change' rows
df = df[df['Element'] == 'Temperature change']
# 3. Drop unwanted columns from df_countrycode
df_countrycode.drop(['Country Code','M49 Code','ISO2 Code','Start Year','End Year'],axis=1,inplace=True)
df_countrycode.rename(columns = {'Country':'Country Name','ISO3 Code':'Country Code'},inplace=True)
# 4. Merge df with df_countrycode
df = pd.merge(df, df_countrycode, how='outer', on='Country Name')
# 5. Drop unwanted columns
df.drop(['Area Code','Months Code','Element Code','Element','Unit'],axis=1,inplace=True)
# 6. Reshape: gather the year columns into a single 'year' column
df = df.melt(id_vars=["Country Code", "Country Name","Months",], var_name="year", value_name="tem_change")
df["year"] = [i.split("Y")[-1] for i in df.year]
display(df.sample(5))
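The reshaping step is easier to see on a small frame. This is a sketch on invented values, not the project data: `melt` turns one column per year into one row per year, and the leading 'Y' is then removed from the year labels. The frame names `wide` and `tidy` are only used for this illustration.

```python
import pandas as pd

# Toy wide-format frame, invented for illustration only
wide = pd.DataFrame({
    "Country Name": ["Austria", "Belgium"],
    "Months": ["Winter", "Winter"],
    "Y1961": [0.5, 0.3],
    "Y1962": [-0.2, 0.1],
})

# One row per (country, period, year) instead of one column per year
tidy = wide.melt(id_vars=["Country Name", "Months"],
                 var_name="year", value_name="tem_change")
# Drop the leading 'Y' so that 'Y1961' becomes '1961'
tidy["year"] = tidy["year"].str.replace("Y", "", regex=False)
print(tidy)
```

The long format is what lets a single 'year' column be used later for grouping and for animation frames.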
Once I have my data, understand its origins, and have loaded it into a data frame, I can explore it with queries. At this stage, my goal is to investigate the missing values and answer my guiding questions. Before answering the guiding questions, I looked at the null values; from the information above, I know that only the temperature change column has nulls. As I expected, earlier years have more null values than recent ones. In the first query, I counted the missing values for each year and found that 1961 has 719 missing values while 2019 has 385. This decreasing trend is roughly the same across the yearly, seasonal, and monthly categories.
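A per-year count of missing values could be computed along the following lines. This is a sketch on a tiny invented frame with the same column names as the reshaped data; the counts it prints are not the project's figures.

```python
import numpy as np
import pandas as pd

# Toy long-format frame, invented for illustration only
long_df = pd.DataFrame({
    "year": ["1961", "1961", "1961", "2019", "2019", "2019"],
    "tem_change": [np.nan, np.nan, 0.4, 1.2, np.nan, 1.5],
})

# isna() marks missing cells; summing the booleans per year counts them
missing_per_year = long_df["tem_change"].isna().groupby(long_df["year"]).sum()
print(missing_per_year.to_dict())
```

Restricting the frame to one 'Months' category before counting would give the yearly, seasonal, or monthly breakdown separately.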
Why?
It is already known that global warming has sped up during the last ten years⁷. I wanted to figure out which countries have been affected most.
Code explanation:
I wanted to show the countries' names and their average temperature change over the last decade. So, I selected the last ten years (2010–2019), grouped the values by country, and averaged them. I then sorted the averages in descending order and limited the output to ten values.
Results:
Because the 'Country Name' column also contains area names, the top ten list shows Europe as well as individual European countries. The list suggests that Europe, together with its neighbour the Russian Federation, has been affected most by climate change. Not surprisingly, all the countries on the list are industrialized, except 'Svalbard and Jan Mayen Islands'. This area tops the list; it lies near Europe and Russia and is in the Arctic, and its natural life is in danger for this reason⁸.
df_c =df.copy()
df_c.set_index("year", inplace=True)
df_c = df_c.loc[['2010','2011','2012','2013','2014','2015','2016','2017','2018','2019']]
df_c.reset_index(inplace = True)
df_c = df_c.groupby(
['Country Name',]
).agg(
{
'tem_change':'mean',
}
)
df_c.reset_index(inplace = True)
df_c = df_c.sort_values(by=['tem_change'],ascending=False).head(10)
fig = px.bar(df_c, x="Country Name", y='tem_change' ,text='tem_change', title="Top ten countries with the highest temperature change in the last decade")
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
# adjusting size of graph, legend place, and background colour
fig.update_layout(
autosize=False,
width=1000,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="v",
yanchor="bottom",
y=0.3,
xanchor="left",
x=1.02
))
fig.update_xaxes( tickangle = 10,
title_text = "Countries",
title_font = {"size": 15},
title_standoff = 0)
fig.update_yaxes(showticklabels=False,tickmode="auto", title='Temperature Change',title_standoff = 0)
fig.show()
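The filter, group, and sort steps used for this question can also be written as one short chain. This is a sketch on invented values, not a replacement for the cell above; casting the year to an integer is an assumption made here so that the decade filter does not rely on string comparison.

```python
import pandas as pd

# Toy long-format frame, invented for illustration only
toy = pd.DataFrame({
    "Country Name": ["A", "A", "B", "B", "C", "C"],
    "year": ["2009", "2015", "2012", "2018", "2011", "2019"],
    "tem_change": [9.0, 1.0, 2.0, 3.0, 0.5, 0.7],
})

# Keep 2010-2019; the cast avoids relying on string comparison of years
recent = toy[toy["year"].astype(int).between(2010, 2019)]
# Mean per country, then the n largest means in descending order
top = recent.groupby("Country Name")["tem_change"].mean().nlargest(2)
print(top)
```

Swapping `nlargest` for `nsmallest` gives the other end of the ranking.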
Why?
I wanted to find the countries with the lowest temperature change in the last decade, the reverse of the question above.
Code explanation:
I wanted to show the countries' names and their average temperature change over the last decade. So, I selected the last ten years (2010–2019), grouped the values by country, and averaged them. I then sorted the averages in ascending order and limited the output to ten values. Missing values would distort the average, so they are left out; pandas skips null values by default when computing a mean.
Results:
I think the surprising country on the list is India. Although India is a developing country, it has a lot of industrial activity. The list suggests that these activities have not had much effect on its temperature rise. As I expected, there is no developed country on the list.
df_c =df.copy()
df_c.set_index("year", inplace=True)
df_c = df_c.loc[['2010','2011','2012','2013','2014','2015','2016','2017','2018','2019']]
df_c.reset_index(inplace = True)
df_c = df_c.groupby(
['Country Name',]
).agg(
{
'tem_change':'mean',
}
)
df_c.reset_index(inplace = True)
df_c = df_c.sort_values(by=['tem_change'],ascending=True).head(10)
fig = px.bar(df_c, x="Country Name", y='tem_change',text='tem_change' , title="Ten countries with the lowest temperature change in the last decade")
fig.update_traces(texttemplate='%{text:.2f}', textposition='outside')
# adjusting size of graph, legend place, and background colour
fig.update_layout(
autosize=False,
width=1000,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="v",
yanchor="bottom",
y=0.3,
xanchor="left",
x=1.02
))
fig.update_xaxes( tickangle = 10,
title_text = "Countries",
title_font = {"size": 15},
title_standoff = 0)
fig.update_yaxes(showticklabels=False, title='Temperature Change')
fig.show()
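Since missing values can affect an average, it helps to see how pandas treats them. A minimal sketch on an invented series: `mean()` ignores NaN by default, `skipna=False` makes a missing value propagate, and `count()` reports how many values actually entered the average.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

mean_skipping = s.mean()             # default skipna=True: NaN is ignored
mean_strict = s.mean(skipna=False)   # NaN propagates into the result
n_used = s.count()                   # number of non-missing values
print(mean_skipping, mean_strict, n_used)
```

Reporting the count next to each country's mean would show when an average rests on only a few observations.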
Why?
With this question, I wanted to find the effect of industrialization on temperature change over the years.
'Annex I countries' refers to industrialized countries, while 'Non-Annex I countries' are mostly developing countries that are vulnerable to the adverse impacts of climate change.⁹ For this reason, I compared the yearly temperature change data of 'Annex I countries', 'Non-Annex I countries', and 'World'.
Code explanation:
I kept only the yearly ('Meteorological year') values and filtered them into three data frames, one each for 'World', 'Annex I countries', and 'Non-Annex I countries'. I then plotted each data frame as its own labelled trace on a single figure.
Results:
Examining the graph, 'Annex I countries' shows a relatively smooth line, whereas 'Non-Annex I countries' shows a more fluctuating line with many peaks. This suggests that global warming has been speeding up for industrialized countries. As expected, the World values lie between the other two categories. Finally, the series can be roughly split into ten-year periods according to the 'Annex I countries' values, because nearly every decade has a higher peak than the previous ones.
df0 = df[df['Months'] == 'Meteorological year'] # new data frame includes only yearly values
df1 = df0[df0['Country Name'] == 'World'] # from new data frame filtering World's data
df2 = df0[df0['Country Name'] == 'Annex I countries'] # from new data frame filtering'Annex I countries' data
df3 = df0[df0['Country Name'] == 'Non-Annex I countries'] # from new data frame filtering 'Non-Annex I countries' data
# Create traces
fig = go.Figure()
#create each categories
fig.add_trace(go.Scatter(x = df1.year, y=df1.tem_change,
mode='markers',
name='World'))
fig.add_trace(go.Scatter(x = df2.year , y=df2.tem_change,
mode='lines',
name='Annex I countries'))
fig.add_annotation(x='55',y=2.098,
xref="x", yref="y",
text="The hottest record",
showarrow=True,
font=dict(
family="Courier New, monospace",
size=16,
color="#ffffff"
),
align="center",
arrowhead=2,
arrowsize=1,
arrowwidth=2,
arrowcolor="#636363",
ax=20,
ay=-30,
bordercolor="#c7c7c7",
borderwidth=2,
borderpad=4,
bgcolor="#ff7f0e",
opacity=0.8
)
fig.add_trace(go.Scatter(x = df3.year , y=df3.tem_change,
mode='lines', name='Non-Annex I countries'))
# adjusting size of graph, legend place, and background colour
fig.update_layout(
autosize=False,
width=1000,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
))
fig.update_xaxes(type='category',title='Years')
fig.update_yaxes(title='Temperature Change')
fig.show()
#reference: https://plotly.com/python/line-and-scatter/
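The annotation in the cell above is placed with hard-coded coordinates. As a sketch under the assumption that the peak should follow the data, the hottest year and its value could be looked up with `idxmax`; the frame `annex` and its values are invented for illustration.

```python
import pandas as pd

# Toy yearly series, invented for illustration only
annex = pd.DataFrame({
    "year": ["2014", "2015", "2016", "2017"],
    "tem_change": [1.2, 1.7, 2.1, 1.8],
})

# idxmax returns the index label of the largest value
peak = annex.loc[annex["tem_change"].idxmax()]
peak_year, peak_value = peak["year"], peak["tem_change"]
print(peak_year, peak_value)
```

The resulting pair could replace the hard-coded coordinates, although how Plotly resolves a year string on a category axis is worth checking against the rendered figure.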
Why?
Like the question above, this one examines the seasonal effects of climate change alongside the yearly trend.
Code explanation:
I selected the 'World' records and filtered them into four data frames, one for each season ('Winter', 'Spring', 'Summer', 'Fall'). I then plotted each season as its own labelled trace on a single figure.
Results:
Examining the graph, Summer shows a relatively smooth line, whereas Winter shows a more fluctuating line with many peaks. This suggests that the effects of global warming are seen mostly in the winter season. Spring fluctuates in a similar way to winter, while fall follows a trend similar to summer. The graph also shows that the world had its hottest winter in 2016.¹⁰
df0 = df[df['Country Name'] == 'World']
df1 = df0[df0['Months'] == 'Winter']
df2 = df0[df0['Months'] == 'Spring']
df3 = df0[df0['Months'] == 'Summer']
df4 = df0[df0['Months'] == 'Fall']
# Create traces
fig = go.Figure()
fig.add_trace(go.Scatter(x = df1['year'], y=df1.tem_change,
mode='lines',
name='Winter'))
fig.add_trace(go.Scatter(x = df2['year'] , y=df2.tem_change,
mode='markers',
name='Spring'))
fig.add_trace(go.Scatter(x = df3['year'] , y=df3.tem_change,
mode='lines', name='Summer'))
fig.add_trace(go.Scatter(x = df4['year'] , y=df4.tem_change,
mode='markers', name='Fall'))
fig.add_annotation(x='55',y=2.165,
xref="x", yref="y",
text="The hottest winter",
showarrow=True,
font=dict(
family="Courier New, monospace",
size=16,
color="#ffffff"
),
align="center",
arrowhead=2,
arrowsize=1,
arrowwidth=2,
arrowcolor="#636363",
ax=20,
ay=-30,
bordercolor="#c7c7c7",
borderwidth=2,
borderpad=4,
bgcolor="#ff7f0e",
opacity=0.8
)
# adjusting size of graph, legend place, and background colour
fig.update_layout(
autosize=False,
width=1000,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
))
fig.update_xaxes(type='category',title='Years')
fig.update_yaxes(title='Temperature Change')
fig.show()
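How much a seasonal line fluctuates can be put into a number instead of being read off the graph. The sketch below, on invented values, uses the standard deviation of the year-to-year change within each season as that number; this measure is an assumption chosen for illustration, and the rows must already be sorted by year within each season.

```python
import pandas as pd

# Toy seasonal series, invented for illustration only
world = pd.DataFrame({
    "Months": ["Winter"] * 4 + ["Summer"] * 4,
    "year": ["2016", "2017", "2018", "2019"] * 2,
    "tem_change": [2.0, 1.0, 2.0, 1.0, 1.0, 1.1, 1.2, 1.3],
})

# Year-to-year change within each season, then its standard deviation:
# a larger value means a more jagged line
jumps = world.groupby("Months")["tem_change"].diff()
variability = jumps.groupby(world["Months"]).std()
print(variability)
```

In this toy data the winter series alternates up and down while the summer series rises steadily, so winter gets the larger value.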
Why?
I wanted to investigate how many record-breaking years occurred in this decade, to learn whether global warming was more rapid in the last decade.
Code explanation:
I selected the 'World' records from the 'Country Name' column. Then I kept only the monthly temperature change values.
Results:
The result shows that eight of the ten years in the most recent decade (2010–2019) were among the ten hottest years on record in terms of mean annual temperature. Additionally, the radar chart clearly shows how the temperature change has increased over the years.
df0 = df[df['Country Name'] == 'World']
df0.set_index("Months", inplace=True)
df0 = df0.loc[['January', 'February', 'March', 'April', 'May', 'June', 'July','August', 'September', 'October', 'November', 'December' ]]
df0.reset_index(inplace = True)
fig = px.line_polar(df0, r=df0.tem_change, theta=df0.Months,animation_frame='year', line_close=True)
fig.update_layout(
polar=dict(
radialaxis=dict(
visible=True,
range=[-0.5, 3]
)),
autosize=False,
width=1000,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="h",
yanchor="bottom",
y=1.02,
xanchor="right",
x=1
))
fig.show()
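A count such as "how many of the hottest years fall in the last decade" could be checked directly from the annual values. This is a sketch on an invented series indexed by year, using the five warmest years to keep the example small; the numbers are not the project's results.

```python
import pandas as pd

# Toy annual anomalies indexed by year, invented for illustration only
annual = pd.Series(
    [0.1, 0.6, 0.2, 0.9, 0.5, 1.1, 1.0, 1.4],
    index=[1990, 1998, 2005, 2010, 2012, 2015, 2017, 2019],
)

# The n warmest years on record, then how many of them fall in 2010-2019
hottest = annual.nlargest(5)
in_last_decade = sum(2010 <= year <= 2019 for year in hottest.index)
print(sorted(hottest.index), in_last_decade)
```

With the real data, the series would hold the 'World' yearly values and `nlargest(10)` would match the ten-hottest-years statement.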
In this section of the project, I examined how global surface temperature changed between 1961 and 2019. According to the answers to my guiding questions, the ten areas with the highest temperature change in the last decade are mostly industrialized countries. Additionally, I found that temperature increased from decade to decade, and the last decade can be counted as the hottest. From the analysis above, I concluded that winters are getting warmer. Finally, I tried to show how temperature is increasing worldwide as evidence of global warming. In the next section, we will use these and other outputs to analyze the effects of climate change in more depth.
As a visualization of these results, I added an animated map that shows how climate change becomes more serious year by year. For the map, I added one more column in a separate data frame; this column assigns each temperature change to an interval. I used warmer colours for values above zero and colder colours for zero and below.
df_tem_change = df.copy() # do not lose 'df'
df_tem_change = df_tem_change[df_tem_change['Months'] == 'Meteorological year'] # choose just year data
df_tem_change.drop(['Months'],axis=1,inplace=True) # dropped Months column
df_tem_change.to_csv(r'./Temperature_change_Data.csv',index=False) # export data to share with the project group members
df_map = df.copy() # do not lose 'df'
df_map = df_map[df_map['Months'] == 'Meteorological year'] # chose yearly base data
df_map['°C'] = ['<=-1.5' if x<=(-1.5) else '<=-1.0' if (-1.5)<x<=(-1.0) else '<=0.0' if (-1.0)<x<=0.0 else '<=0.5' if 0.0<x<=0.5 else '<=1.5' if 0.5<x<=1.5 else '>1.5' if 1.5<=x<10 else 'None' for x in df_map['tem_change']]
# categorized each of temperature changes
fig = px.choropleth(df_map, locations="Country Code", # used plotly express choropleth for animation plot
color="°C",
locationmode='ISO-3',
hover_name="Country Name",
hover_data=['tem_change'],
animation_frame =df_map.year,
labels={'tem_change':'The Temperature Change', '°C':'°C'},
category_orders={'°C':['<=-1.5','<=-1.0','<=0.0','<=0.5','<=1.5','>1.5','None']},
color_discrete_map={'<=-1.5':"#08519c",'<=-1.0':"#9ecae1",'<=0.0':"#eff3ff",'<=0.5':"#ffffb2",'<=1.5': "#fd8d3c",'>1.5':"#bd0026",'None':"#252525"},
title = 'Temperature Change - 1961 - 2019')
# adjusting size of map, legend place, and background colour
fig.update_layout(
autosize=False,
width=1200,
height=600,
margin=dict(
l=50,
r=50,
b=100,
t=100,
pad=4
),
template='seaborn',
paper_bgcolor="rgb(234, 234, 242)",
legend=dict(
orientation="v",
yanchor="auto",
y=1.02,
xanchor="right",
x=1
))
fig.show()
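The interval labels used for the map can also be produced with `pd.cut` instead of a chained conditional. This is a sketch on invented values with the same bin edges and labels; unlike the cell above, it does not send values of 10 or more to 'None', which is a simplification made here.

```python
import numpy as np
import pandas as pd

# Toy temperature changes, invented for illustration only
values = pd.Series([-2.0, -1.2, -0.3, 0.5, 1.0, 2.3, np.nan])

bins = [-np.inf, -1.5, -1.0, 0.0, 0.5, 1.5, np.inf]
labels = ["<=-1.5", "<=-1.0", "<=0.0", "<=0.5", "<=1.5", ">1.5"]

# right=True (the default) makes each bin closed on its upper edge
binned = pd.cut(values, bins=bins, labels=labels)
# Missing values fall in no bin; give them an explicit 'None' category
binned = binned.cat.add_categories("None").fillna("None")
print(binned.tolist())
```

Keeping the edges and labels in two lists makes it easier to change the intervals without rewriting the whole expression.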